Global repair is the primary nucleotide excision repair subpathway for the removal of pyrimidine-pyrimidone (6-4) damage from the Arabidopsis genome

The ultraviolet (UV) component of solar radiation impairs genome stability by inducing the formation of pyrimidine-pyrimidone (6-4) photoproducts [(6-4)PPs] in plant genomes. (6-4)PPs disrupt growth and development by interfering with transcription and DNA replication. To resist UV stress, plants employ both photoreactivation and nucleotide excision repair, which excises oligonucleotides containing (6-4)PPs through two subpathways: global repair and transcription-coupled repair (TCR). Here, we analyzed genome-wide excision repair of (6-4)PPs in Arabidopsis thaliana and found that (6-4)PPs can be repaired by TCR; however, the main subpathway that removes (6-4)PPs from the genome is global repair. Our analysis showed that open chromatin regions are repaired more rapidly than heterochromatin regions, and that the repair level peaks at the promoter, transcription start site and transcription end site of genes. Our study revealed that the genome-wide repair profile of (6-4)PPs in plants is distinct from that of the other major UV-induced DNA lesion, cyclobutane pyrimidine dimers (CPDs).

A (RPA) and Xeroderma pigmentosum complementation group C (XPC) proteins. The Transcription Factor IIH (TFIIH) complex is recruited to the damage site, followed by DNA strand incision by the Xeroderma Pigmentosum complementation group F (XPF)-Excision Repair Cross-Complementation group 1 (ERCC1) and Xeroderma Pigmentosum Group G (XPG) endonucleases. The damaged DNA strand is then replaced by polymerase and ligase activities. This process is known as global repair, the excision repair subpathway that removes UV damage from all genome regions. If UV lesions occur on the transcribed strands of genes, elongating RNA polymerase II is blocked at the damage site during transcription. Cockayne syndrome group A (CSA) and CSB proteins detect the stalled RNA polymerase II and recruit the TFIIH complex and endonucleases, a process known as transcription-coupled repair (TCR). In Arabidopsis, TCR depends on the CSA1 protein, which has been shown to interact with its homolog, CSA2. However, CSA2 has only a minor influence on TCR [13][14][15]. CSA1 is part of the CUL4-DDB1A^CSA1 E3 ligase in plants [14]. Similarly, DDB2 interacts with the CUL4-DDB1A complex to form another E3 ligase that plays a role in global repair under UV-induced stress [16]. Arabidopsis plants deficient in DDB2 are more sensitive to UV radiation than the wild type [17], while overexpression of Arabidopsis DDB1A enhances the plant's tolerance to UV exposure [18].
Transcription rate across the genome is the main influence on excision repair-mediated CPD repair in Arabidopsis. The CPD repair level is about five times higher on the transcribed strands of genes than on the non-transcribed strands, indicating the strong dominance of TCR in CPD repair. Additionally, the chromatin state affects the repair of CPD lesions by excision repair, as repair is less efficient in heterochromatic regions than in open chromatin regions. Furthermore, the removal of CPDs by TCR, but not by global repair, oscillates throughout the day [19]. While excision repair-mediated CPD repair in Arabidopsis has been extensively studied, comprehensive knowledge of the genome-wide dynamics of (6-4)PP repair in plants is still lacking.
Excision repair sequencing (XR-seq) generates a genome-wide profile of excision repair, providing precise information on the repair of specific DNA lesions at single-nucleotide resolution and at a particular time point [20]. With XR-seq, the level of TCR in a specific gene can be assessed by analyzing the difference in repair signal between the transcribed and non-transcribed strands of the gene. The XR-seq method involves immunoprecipitation of the excised DNA oligonucleotides containing the lesions (excision products) with lesion-specific monoclonal antibodies. These excision products are then characterized by next-generation sequencing, and the resulting reads are aligned to the reference genome to determine the sites of excision repair activity. Here, we generated and examined the genome-wide repair map of (6-4)PPs in Arabidopsis seedlings by XR-seq. We found that (6-4)PPs can be repaired by TCR. However, the level of TCR is lower in (6-4)PP repair than in CPD repair. The repair of (6-4)PP lesions exhibited distinct peaks at the promoter, transcription start site (TSS) and transcription end site (TES) of genes. Similar to CPD repair, the rate of (6-4)PP repair is influenced by the chromatin state, with more efficient repair in open chromatin regions than in regions associated with heterochromatin. Our study revealed that excision repair-mediated removal of pyrimidine-pyrimidone (6-4) damage from the Arabidopsis genome occurs mainly through global repair.

Results
To understand the dynamics of excision repair-mediated (6-4)PP repair, we irradiated 10-day-old Arabidopsis seedlings with UVC (254 nm, 120 J/m²) and performed a time-course analysis of (6-4)PP-containing excision product formation at 5, 15 and 30 min after UV irradiation. Notably, the (6-4)PP lesions in the genome were repaired only by excision repair, because we prevented blue-light-dependent photoreactivation by keeping the seedlings under yellow light after UV irradiation. We detected primary excision products, ranging from 24 to 27 nucleotides (nt) in size, at the 15-min and 30-min time points (Fig. S1A). In addition, we observed a population of shorter excision products, approximately 18 nt in length, formed by degradation of the primary excision products from the 5' end [19]. A previous study detected (6-4)PP repair activity in T87 Arabidopsis cell suspension culture 30 min after UV exposure [21], which differs from our findings in seedlings. This difference may reflect distinct excision repair kinetics in different Arabidopsis cell types [22].
To create a genome-wide (6-4)PP excision repair map of Arabidopsis seedlings, we performed XR-seq 15 min after UV irradiation. Briefly, we immunoprecipitated the (6-4)PP-containing excision products, ligated adapters to them and amplified them by PCR to generate the library (Fig. S1B). We then sequenced the libraries to obtain reads representing excision products and aligned these reads to the Arabidopsis thaliana genome [23] to map the sites of excision repair-mediated (6-4)PP repair across the genome. Our results showed that two populations of excision products of different lengths were generated during (6-4)PP repair, consistent with the results of the in vivo excision assay. The primary excision products were 25-28 nt in length, with a peak at 27 nt, and a population of shorter excision products (16-21 nt) was also present (Fig. 1A). The length distribution of (6-4)PP-containing excision products was similar to that of CPD-containing excision products [19]. Moreover, the nucleotide frequency distribution of 27-nt-long excision products revealed that positions 7-8 nt from the 3' end are pyrimidine-rich, consistent with other eukaryotes (Fig. 1B).
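The read-length profile described above can be illustrated with a minimal sketch. It assumes the aligned reads are already available as plain sequence strings; the function name `length_distribution` and the toy reads are hypothetical and do not come from the study's pipeline or data.

```python
from collections import Counter


def length_distribution(reads):
    """Return {read length: percent of reads}, sorted by length."""
    counts = Counter(len(read) for read in reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: 100.0 * n / total for length, n in sorted(counts.items())}


# Toy input (hypothetical): five 27-nt, three 26-nt and two 18-nt reads.
toy_reads = ["A" * 27] * 5 + ["A" * 26] * 3 + ["A" * 18] * 2
dist = length_distribution(toy_reads)

# The most frequent length marks the peak of the primary excision products.
peak = max(dist, key=dist.get)
for length, percent in dist.items():
    print(f"{length} nt: {percent:.1f}%")
```

On real data, a bimodal profile of this kind would separate the primary excision products from the shorter degradation products.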
The excision products with (6-4)PP showed a higher frequency of cytosines at the damage site than CPD-containing excision products. TT and TC dipyrimidines dominated the 20th and 21st positions of the XR-seq reads, with 26.1% and 24.2% abundance, respectively, while the third most prevalent dipyrimidine was CT, with 15.9% abundance. In contrast, TT was by far the most dominant dipyrimidine at the same positions of the CPD XR-seq (ZT20) reads, with 42.7% abundance, whereas CT and TC were only 4.7% and 11.5% abundant, respectively. While UV irradiation induces the formation of CPDs mainly at TT sites [19], (6-4)PPs are formed at a higher frequency at CC, CT and TC sites in addition to TT sites.
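As an illustration of how the dinucleotide abundance at the presumed damage site can be tallied, the following sketch counts the dinucleotide at 1-based positions 20-21 of 27-nt reads. The function name, the helper `make_read` and the toy reads are hypothetical; the percentages it prints are for the toy input only and are not the values reported above.

```python
from collections import Counter


def dinucleotide_frequency(reads, read_length=27, start=20):
    """Percent of each dinucleotide at 1-based positions start..start+1,
    considering only reads that are exactly read_length nt long."""
    selected = [r.upper() for r in reads if len(r) == read_length]
    counts = Counter(r[start - 1:start + 1] for r in selected)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {dinuc: 100.0 * n / total for dinuc, n in counts.items()}


def make_read(dinuc):
    """Build a toy 27-nt read carrying `dinuc` at positions 20-21."""
    return "A" * 19 + dinuc + "G" * 6


# Toy input (hypothetical); the 26-nt read is ignored by the length filter.
toy_reads = [make_read("TT")] * 2 + [make_read("TC"), make_read("CT"), "A" * 26]
freq = dinucleotide_frequency(toy_reads)
for dinuc, percent in sorted(freq.items()):
    print(f"{dinuc}: {percent:.1f}%")
```

Restricting the count to a single read length keeps the damage-site position fixed relative to the 5' end, which is why only the 27-nt reads are used here.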
To investigate whether Arabidopsis plants employ TCR to remove (6-4)PP lesions, we compared repair levels on the transcribed strand (TS) and non-transcribed strand (NTS) of annotated genes. In addition, we included simulated XR-seq data consisting of synthetic reads selected randomly from the genome to match the total count and sequence content of the real XR-seq reads [24]. By dividing the observed repair by the simulated repair, we obtained normalized repair rates and eliminated the sequence content bias that might affect the repair profiles. Analysis of TS and NTS repair across all annotated genes revealed a slightly higher repair level on the TS than on the NTS, indicating the involvement of TCR in (6-4)PP repair (Fig. 2A, B). Repair levels on both the TS and the NTS peaked at the transcription start site (TSS). We also detected another peak in the promoter region of genes on both strands. At the transcription end site (TES), peaks were identified on both the TS and the NTS, albeit at distinct positions. However, the level of TCR activity during (6-4)PP repair was lower than that observed during CPD repair in our previous study [19]. As an example, we calculated the TS/NTS ratios and determined that the TS was repaired more efficiently than the NTS in genes such as AT2G23430 (ICK1) and AT2G23420 (NAPRT2), as depicted in the genomic view (Fig. 2C). Next, we examined the correlation between transcription levels and repair rates. To do so, we used RNA-seq data to calculate the expression levels (TPMs) of the genes [25] and compared them with the TS/NTS repair ratio of each gene. The results revealed a correlation between TCR level and transcription rate during (6-4)PP repair (Fig. 2D). However, this correlation was not as pronounced as that observed in CPD repair [19].
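The normalization and correlation steps above can be sketched as follows. This is a minimal illustration, assuming per-gene, per-strand read counts are already available for both the real and the simulated XR-seq data. All function names and toy numbers are hypothetical. Correlating the log2 TS/NTS ratio against log10-transformed TPM is an assumption made here for illustration, as is rejecting regions without simulated reads.

```python
import math


def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


def normalized_repair(real_count, sim_count, length_bp, real_total, sim_total):
    """RPKM of real XR-seq reads divided by RPKM of simulated reads."""
    if sim_count == 0:
        # Assumption: regions without simulated reads are filtered out upstream.
        raise ValueError("no simulated reads in this region")
    real = rpkm(real_count, length_bp, real_total)
    simulated = rpkm(sim_count, length_bp, sim_total)
    return real / simulated


def log2_ts_nts(ts_norm, nts_norm):
    """log2 of normalized TS repair over normalized NTS repair."""
    return math.log2(ts_norm / nts_norm)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy gene (hypothetical): 2 kb long, one million reads in each library.
ts_norm = normalized_repair(120, 100, 2000, 1_000_000, 1_000_000)
nts_norm = normalized_repair(100, 100, 2000, 1_000_000, 1_000_000)
ratio = log2_ts_nts(ts_norm, nts_norm)  # > 0 means the TS is repaired more

# Toy per-gene values (hypothetical) for the expression comparison.
log2_ratios = [0.05, 0.10, 0.20, 0.30]
log10_tpm = [0.0, 1.0, 2.0, 3.0]
r = pearson(log2_ratios, log10_tpm)
print(f"normalized TS = {ts_norm:.2f}, log2(TS/NTS) = {ratio:.3f}, r = {r:.3f}")
```

A log2 ratio of 0 corresponds to equal normalized repair on both strands, so positive values indicate preferential repair of the transcribed strand.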
The difference between the repair levels on the TS and NTS indicates whether repair at a given time point was dominated by TCR. Therefore, we compared the distribution of TS/NTS repair ratios across genes between the (6-4)PP and CPD data (Fig. 3A). CPD repair showed higher TS/NTS repair ratios for most genes, whereas the TS/NTS repair ratios of (6-4)PP damage were only slightly higher than 1 for most genes. This indicated lower TCR activity in (6-4)PP repair at 15 min than in CPD repair at 30 min. Screenshots of the XR-seq read distributions on two genes supported this observation (Fig. 3B). (6-4)PP repair showed only a slight difference between the TS and NTS, while CPD repair was higher on the TS of both genes. The comparison of (6-4)PP and CPD repair profiles on Arabidopsis genes also revealed differences between the repair of these two damage types (Fig. 3C). On the genes, the repair rates of both damage types differed between strands. However, this strand difference was much more evident in CPD repair than in (6-4)PP repair, again indicating that TCR activity was lower in (6-4)PP repair at 15 min than in CPD repair at 30 min.
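One simple way to compare two distributions of per-gene TS/NTS ratios is to summarize each by its median and by the fraction of genes with more repair on the TS. The sketch below assumes the ratios are already on a log2 scale; the function name and all toy values are hypothetical and are not the study's data.

```python
from statistics import median


def summarize_ratios(log2_ratios):
    """Median log2(TS/NTS) and fraction of genes with TS-biased repair."""
    n = len(log2_ratios)
    return {
        "median_log2": median(log2_ratios),
        "fraction_ts_biased": sum(r > 0 for r in log2_ratios) / n,
    }


# Toy per-gene log2(TS/NTS) values (hypothetical).
toy_64pp = [0.10, 0.20, -0.10, 0.05]
toy_cpd = [1.5, 2.0, 2.5, 1.0]

summary_64pp = summarize_ratios(toy_64pp)
summary_cpd = summarize_ratios(toy_cpd)
print("(6-4)PP:", summary_64pp)
print("CPD:", summary_cpd)
```

A median close to 0 together with a modest TS-biased fraction would point to weak TCR, whereas a clearly positive median would point to strong TCR.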
Furthermore, we examined repair levels across different genomic elements identified in the Arabidopsis genome [26]. We observed that AT-rich and GC-rich heterochromatin regions of the genome exhibited slower repair than distal regulatory intergenic regions with euchromatin signatures (Fig. 4), indicating that the epigenetic state influences the rate of (6-4)PP repair. Unlike in CPD repair, we found similar (6-4)PP repair levels across transcribed regions, such as the 5' and 3' ends of genes. This suggests that the difference in repair level between the 5' and 3' ends of genes in CPD repair is due to strong TCR; in (6-4)PP repair, where TCR is less prominent, this difference diminishes. In conclusion, global repair plays a more significant role than TCR in (6-4)PP repair. The main factor determining (6-4)PP repair is the epigenetic state, while transcription rate plays a more significant role in CPD repair [19].
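A comparison of repair across genomic elements can be sketched by pooling reads per annotated chromatin state and dividing each state's share of the real reads by its share of the simulated reads. This particular normalization is an assumption chosen by analogy with the gene-level normalization; the function name, state labels and counts below are hypothetical.

```python
from collections import defaultdict


def repair_by_state(regions):
    """regions: iterable of (state, real_reads, simulated_reads).

    Returns {state: share of real reads / share of simulated reads}.
    Values above 1 mean more repair than expected from sequence content.
    """
    real = defaultdict(int)
    simulated = defaultdict(int)
    for state, real_reads, sim_reads in regions:
        real[state] += real_reads
        simulated[state] += sim_reads
    real_total = sum(real.values())
    sim_total = sum(simulated.values())
    return {
        state: (real[state] / real_total) / (simulated[state] / sim_total)
        for state in real
        if simulated[state] > 0  # skip states without simulated reads
    }


# Toy annotation (hypothetical labels and counts).
toy_regions = [
    ("euchromatin", 300, 200),
    ("euchromatin", 300, 200),
    ("heterochromatin", 400, 600),
]
enrichment = repair_by_state(toy_regions)
for state, value in sorted(enrichment.items()):
    print(f"{state}: {value:.2f}")
```

Pooling reads per state before taking the ratio avoids giving short regions with few reads an outsized influence on the result.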

Discussion
As sessile organisms, plants are exposed to the UV component of sunlight during photosynthesis. While efficiently capturing blue and red light with chlorophylls, they must protect their genome from UV radiation. Thanks to the stratospheric ozone layer, only UV-A (320-400 nm) and UV-B (280-320 nm) reach the Earth's surface; UV-C (100-280 nm), the most damaging type, is blocked. UV-B causes the formation of UV photoproducts in plant genomes [9]. Plants possess flavonoids and carotenoids that block UV penetration into deep tissues and eliminate the oxidative stress caused by UV exposure [27][28][29]. To maintain their genome integrity, plants employ two DNA repair mechanisms: photoreactivation, which is active only during the day because it depends on blue light, and nucleotide excision repair, which is functional all day. This all-day functionality of excision repair presumably becomes important for removing UV damage that occurs at dusk. Because photolyases rely on blue light, they may be unable to eliminate UV damage that has accumulated by dusk. As a result, excision repair becomes essential at such time points to maintain genome stability.
Our previous analysis of CPD excision repair demonstrated a high level of TCR, meaning that excision repair is preferentially active on the transcribed strands of genes. The correlation between transcription rate

Figure 1. Length distribution and dinucleotide content of XR-seq reads. (A) Length distribution of XR-seq reads. (B) Distribution of CC, CT, TC and TT dinucleotides along the 27-nucleotide-long XR-seq reads.

Figure 2. Repair profiles on the Arabidopsis genome. (A) Normalized TS and NTS repair profiles on Arabidopsis genes and their flanking regions. TS and NTS repair was normalized by dividing the XR-seq read abundance on the TS and NTS by the simulated XR-seq read abundance, separately for the same genes and strands, after RPKM normalization of each genomic window. The upstream and downstream flanking regions were each defined as half the length of the corresponding gene. (B) Normalized TS and NTS repair levels on Arabidopsis genes. Normalized repair was calculated for each gene as in (A) and is shown on a log2 scale, where 0.0 represents the normalized repair level expected from the sequence content of the genes. (C) IGV screenshot of XR-seq and simulated XR-seq read distributions on two genes. TS/NTS and normalized TS/NTS ratios are shown in the lower section. XR-seq read abundance was normalized as in (B) to obtain normalized TS/NTS repair ratios. (D) Correlation between normalized TS/NTS repair and gene expression. RNA-seq data were TPM-normalized. Normalized TS/NTS ratios were obtained as in (B) and (C) and are shown on a log2 scale. The P-value and correlation coefficient were calculated using Pearson correlation. Colors represent the neighboring data points.

Figure 3. Comparison of (6-4)PP and CPD damage repair profiles on the Arabidopsis genome. (A) Distribution of normalized TS/NTS repair ratios on Arabidopsis genes. TS and NTS repair was normalized by dividing the XR-seq read abundance on the TS and NTS by the simulated XR-seq read abundance, separately for each gene and strand, after RPKM normalization. (B) IGV screenshot of XR-seq read distributions for the (6-4)PP and CPD damage types on two genes. (C) Normalized TS and NTS repair profiles on Arabidopsis genes and their flanking regions for the (6-4)PP and CPD damage types. TS and NTS repair was normalized by dividing the XR-seq read abundance on the TS and NTS by the simulated XR-seq read abundance, separately for the same genes and strands, after RPKM normalization of each genomic window. The upstream and downstream flanking regions were each defined as half the length of the corresponding gene.